Author

Aldwin Arambulo (SID 510575452) | Mia Childers (SID 510414892) | Mu-Wei Chung (SID 510611635)

Published

July 24, 2024

This report investigates the impacts of vaccinations amongst the different development regions as we dived into the difference of life expectancy and the global coverage of the vaccination programs. Overall, it aims to act as a resource for NGOs in prioritising what vaccines to distribute, and where, to best increase life expectancy. We used a dataset from OWID (Our World In Data) for vaccination coverage and global life expectancy. We analysed the trend of the vaccination programs in different regions to deduce the impact of the vaccines and potential life expectancy gains from 1980 to 2021.

We have split up our findings in 2 sections:

  • Impact of Vaccines: looks into how the different vaccines on different diseases perform in prevention and increasing the global life expectancy.

  • Modelling and Predicting future life expectancy: looks into the different trends and future projections of life expectancy based on the global vaccination coverage.

We leveraged different methods such as Pearson’s correlation coefficient and linear regression to find the impacts of different vaccinations, and ARIMA, difference of time periods, and ACF and PACF plots to analyse the trends of the time series data on how life expectancy based on the global vaccination coverage.

Life Expectancy Data

Life Expectancy dataset from Our World in Data.

Code
le_df <- read_csv("data/life-expectancy.csv")
paged_table(le_df)

Cleaning life expectancy data

Code
le_clean <- janitor::clean_names(le_df)
# rename column
le_clean <- le_clean %>% 
  rename(period_life_expect = period_life_expectancy_at_birth_sex_all_age_0)
paged_table(le_clean)

Global Vaccination Coverage Data

Code
vaccination_df <- read_csv("data/global-vaccination-coverage.csv")
vaccination_df %>% paged_table()

Cleaning vaccination data

Code
vaccination_clean <- janitor::clean_names(vaccination_df) %>% paged_table()
vaccination_clean

Using a combination of Pearson’s correlation coefficient and linear regression, the vaccines for one dose of Salk’s polio vaccine, one dose of the meningococcal vaccine, three doses of the polio vaccine and three doses of the DPT (diphtheria, pertussis, tetanus) vaccine were found to be the most significant in increasing life expectancy globally.

Code
merged_datasets = merge(vaccination_clean, le_clean, by=c("entity","year"))
names(merged_datasets)[names(merged_datasets) == 'period_life_expect'] <- 'life_expectancy'
Code
ggplot(data=merged_datasets) +
  geom_point(aes(
    x=ipv1_percent_of_one_year_olds_immunized, y=life_expectancy, color=year)) +
  theme_bw() +
  labs(title="Salk (Polio) 1st Dose") + xlab("% 1yos vaccinated") + ylab("Life Expectancy")

Code
ggplot(data=merged_datasets) +
  geom_point(aes(
    x=pol3_percent_of_one_year_olds_immunized, y=life_expectancy, color=year)) +
  theme_bw() +
  labs(title="Polio 3rd Dose") + xlab("% 1yos vaccinated") + ylab("Life Expectancy")

Code
ggplot(data=merged_datasets) +
  geom_point(aes(
    x=mcv1_percent_of_one_year_olds_immunized, y=life_expectancy, color=year)) +
  theme_bw() +
  labs(title="Meningococcal 1st Dose") + xlab("% 1yos vaccinated") + ylab("Life Expectancy")

Code
ggplot(data=merged_datasets) +
  geom_point(aes(
    x=dtp3_percent_of_one_year_olds_immunized, y=life_expectancy, color=year)) +
  theme_bw() +
  labs(title="Diphtheria 3rd Dose") + xlab("% 1yos vaccinated") + ylab("Life Expectancy")

In this section, we will perform a time series analysis on the life expectancy data for three UN development groups: least developed countries, less developed regions excluding the least developed countries, and more developed regions. The reason for this is due to the fact that the life expectancy are very different between these regions.

Code
development_status <- stringr::str_subset(le_clean$entity, regex("devel", ignore_case = TRUE)) %>% unique()

development_status <- c("Least developed countries", "Less developed regions, excluding least developed countries", "More developed regions")

p <- le_clean %>%
  filter(entity %in% development_status) %>%
  ggplot(aes(x = year, 
             y = period_life_expect,
             color = entity)) +
  geom_line() + 
  labs(title = "Life Expectancy by Year",
       x = "Year",
       y = "Life Expectancy at Birth (years)")
ggplotly(p)

Building ARIMA model

Now, we use auto.arima to fit best ARIMA model for each group using life expectancy data from 1950 onward, and then forecast the life expectancy for the next 5 years.

Convert dataframes to time series

Code
# create a list of 3 dataframes
le_dev_ls <- list()
for (dev in development_status) {
  le_dev_ls[[dev]] <- le_clean %>% 
    filter(entity == dev) %>%
    select(period_life_expect) %>%
    ts(start = 1950, end = 2021)
}
# le_dev_ls

Fit best ARIMA models

Code
# install.packages("forecast")
library(forecast)
le_dev_opt <- list()
for (dev in development_status) {
  le_dev_opt[[dev]] <- auto.arima(le_dev_ls[[dev]])
}
le_dev_opt
$`Least developed countries`
Series: le_dev_ls[[dev]] 
ARIMA(0,1,2) with drift 

Coefficients:
          ma1      ma2   drift
      -0.4223  -0.1736  0.3957
s.e.   0.1165   0.1071  0.0477

sigma^2 = 0.9743:  log likelihood = -98.47
AIC=204.94   AICc=205.54   BIC=213.99

$`Less developed regions, excluding least developed countries`
Series: le_dev_ls[[dev]] 
ARIMA(2,1,0) with drift 

Coefficients:
         ar1      ar2   drift
      0.5187  -0.5537  0.4077
s.e.  0.1007   0.1005  0.0741

sigma^2 = 0.4287:  log likelihood = -69.57
AIC=147.13   AICc=147.74   BIC=156.18

$`More developed regions`
Series: le_dev_ls[[dev]] 
ARIMA(0,2,1) 

Coefficients:
          ma1
      -0.7382
s.e.   0.0909

sigma^2 = 0.09376:  log likelihood = -16.33
AIC=36.65   AICc=36.83   BIC=41.15

Validate ARIMA models

The auto.arima function found the best ARIMA model for each group as follows:

  • Least developed countries: ARIMA(0,1,2)
  • Less developed regions, excluding least developed countries: ARIMA(2,1,0)
  • More developed regions: ARIMA(0,2,1)

We can also use the sarima function to check if the residuals are white noise (no tread), the ACF of residuals are not significant, p-values of Ljung-Box test are greater than 0.05.

Code
library(astsa)

sarima(le_dev_ls$`Least developed countries`, 0, 1, 2)
initial  value 0.069482 
iter   2 value -0.018455
iter   3 value -0.031497
iter   4 value -0.032603
iter   5 value -0.034478
iter   6 value -0.034615
iter   7 value -0.034655
iter   8 value -0.034655
iter   9 value -0.034655
iter  10 value -0.034655
iter  10 value -0.034655
iter  10 value -0.034655
final  value -0.034655 
converged
initial  value -0.032011 
iter   2 value -0.032051
iter   3 value -0.032054
iter   4 value -0.032057
iter   5 value -0.032062
iter   6 value -0.032062
iter   6 value -0.032062
iter   6 value -0.032062
final  value -0.032062 
converged
<><><><><><><><><><><><><><>
 
Coefficients: 
         Estimate     SE t.value p.value
ma1       -0.4223 0.1165 -3.6248  0.0006
ma2       -0.1736 0.1071 -1.6204  0.1098
constant   0.3957 0.0477  8.3012  0.0000

sigma^2 estimated as 0.9330945 on 68 degrees of freedom 
 
AIC = 2.886429  AICc = 2.891474  BIC = 3.013904 
 

Code
sarima(le_dev_ls$`Less developed regions, excluding least developed countries`, 2, 1, 0)
initial  value -0.199607 
iter   2 value -0.401779
iter   3 value -0.426091
iter   4 value -0.437383
iter   5 value -0.437806
iter   6 value -0.438451
iter   7 value -0.438459
iter   8 value -0.438462
iter   9 value -0.438462
iter  10 value -0.438462
iter  10 value -0.438462
iter  10 value -0.438462
final  value -0.438462 
converged
initial  value -0.439003 
iter   2 value -0.439076
iter   3 value -0.439123
iter   4 value -0.439125
iter   5 value -0.439126
iter   5 value -0.439126
iter   5 value -0.439126
final  value -0.439126 
converged
<><><><><><><><><><><><><><>
 
Coefficients: 
         Estimate     SE t.value p.value
ar1        0.5187 0.1007  5.1526       0
ar2       -0.5537 0.1005 -5.5117       0
constant   0.4077 0.0741  5.5020       0

sigma^2 estimated as 0.4105617 on 68 degrees of freedom 
 
AIC = 2.072301  AICc = 2.077346  BIC = 2.199776 
 

Code
sarima(le_dev_ls$`More developed regions`, 0, 2, 1)
initial  value -0.939558 
iter   2 value -1.154217
iter   3 value -1.164176
iter   4 value -1.170757
iter   5 value -1.170866
iter   6 value -1.170875
iter   6 value -1.170875
iter   6 value -1.170875
final  value -1.170875 
converged
initial  value -1.185666 
iter   2 value -1.185691
iter   3 value -1.185715
iter   3 value -1.185715
iter   3 value -1.185715
final  value -1.185715 
converged
<><><><><><><><><><><><><><>
 
Coefficients: 
    Estimate     SE t.value p.value
ma1  -0.7382 0.0909 -8.1186       0

sigma^2 estimated as 0.09230321 on 69 degrees of freedom 
 
AIC = 0.5235895  AICc = 0.5244299  BIC = 0.5878322 
 

Forecast the life expectancy for the next 5 years

According to the ARIMA model, the life expectancy of least developed countries and less developed regions are expected to increase in the next 5 years whereas that of more developed regions is expected to decrease. However, the downside of this model is that it assume life expectancy of this year is only dependent on life expectancy in previous years, and does not take into account the external effect such as COVID pandemic since 2019.

Code
p1 <- le_dev_opt$`Least developed countries` %>% 
  forecast(h = 5) %>% 
  autoplot() +
  ggtitle("5 Years Forecast for Least Developed Countries") +
  xlab("Year") +
  ylab("Life Expectancy")

p2 <- le_dev_opt$`Less developed regions, excluding least developed countries` %>%
  forecast(h = 5) %>% 
  autoplot() +
  ggtitle("5 Years Forecast for Less Developed Regions (Excluding Least Developed Countries)") +
  xlab("Year") +
  ylab("Life Expectancy")

p3 <- le_dev_opt$`More developed regions` %>%
  forecast(h = 5) %>% 
  autoplot() +
  ggtitle("5 Years Forecast for More Developed Regions") +
  xlab("Year") +
  ylab("Life Expectancy")

p1

Code
p2

Code
p3

References

Dattani, S., Rodés-Guirao, L., Ritchie, H., Ortiz-Ospina, E., & Roser, M. (2023). Life Expectancy. Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/life-expectancy)

Immunization coverage. (n.d.). Retrieved 22 July 2024, from (https://www.who.int/news-room/fact-sheets/detail/immunization-coverage)

WHO, & UNICEF. (n.d.). Vaccination Coverage. Prepared by Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/grapher/global-vaccination-coverage)

Roser, M. (2023). The rise of maximum life expectancy, Our World in Data. Retrieved 22 July 2024, from (https://ourworldindata.org/the-rise-of-maximum-life-expectancy)